MORE ON A CLASS OF OPTIMAL SEARCH PROBLEMS


Warren  W  Willman 

Operations  Research  Branch 
Mathematics  and  Information  Sciences  Division 


Abstract: Optimal strategies are investigated for a class of one-dimensional search processes in
which the objective is to find a point which is near, but not beyond, a boundary of uncertain location.
Problems of this type are encountered in the analysis of mining operations. Upper and lower bounds
for the optimal expected payoff are derived, and the optimal search strategies are described
explicitly for a large subclass of these processes. Results are obtained by formulating the search as
a multistage decision process and using a dynamic programming approach.


INTRODUCTION 

Optimal  policies  are  investigated  here  for  a  class  of  one-dimensional  adaptive  search  processes  in 
which  the  objective  is  to  find  a  point  which  is  near,  but  not  beyond,  a  boundary  of  uncertain  location. 
This  class  is  an  extension  of  a  class  of  similar  search  processes  examined  previously  by  the  author  (1). 
Aside from theoretical considerations, this extension is important because of its applications to certain
mining  operations.  These  problems  share  some  features  of  those  studied  by  Derman  and  Ignall  (2),  but 
are  basically  different  because  the  main  question  here  is  where  to  search,  not  when  to  stop.  They  are 
also  basically  different  from  the  classical  search  problem  described  in  Koopman  (3),  where  the  objective 
is  to  locate  a  small  object,  at  least  approximately,  within  a  large  planar  region  of  uncertainty.  The  only 
applications  of  the  results  of  this  report  to  planar  searches  would  be  to  the  location  of  the  boundary  of 
a  large  planar  region,  where  the  uncertainty  of  the  boundary  location  is  small  compared  to  the  size  of 
the  region.  The  results  here  are  obtained  by  formulating  the  search  as  a  multistage  decision  process  and 
using  a  dynamic  programming  approach. 


A  SEARCH  PROBLEM 


The search process considered here proceeds sequentially. At epoch i (i = 0, 1, ...) a searcher has
the choice of terminating the search or selecting the median m_i of a random variable y_i whose
distribution is rectangular with width T > 0. The term m_i represents the desired search point, whereas
y_i is the actual search location, which is unknown to the searcher. The random variables (y_i - m_i)
are statistically independent, but each has the same distribution width T.


If the search is terminated at epoch N ≥ 0, the searcher receives a return J such that

\[
J = \begin{cases}
0, & \text{if } N = 0 \\
G(y_{N-1}) - N, & \text{if } N > 0
\end{cases}
\]


NRL Problem B01-10; Project RR003O2-41-6152. This is a final report on one phase of the problem. Work on other
phases of this problem is continuing. Manuscript submitted August 18, 1972.
where

\[
G(x) = \begin{cases}
a + kx, & \text{if } x \le b \\
0, & \text{if } x > b.
\end{cases}
\]

Also, k > 0, a > 2 + (1/3k), and b is a random variable with a symmetric trapezoidal probability
density of the class shown in Fig. 1 such that the lower and upper midpoints are 0 and s_0, respectively,
where s_0 > T and T < 1/3k. Also, b is statistically independent of the y's. The quantity G represents
the gain from the search. The cost of a single search step has been taken as unity, without loss of
generality. The random variable b represents a random boundary location. A rectangular density for
b on the interval [0, s_0] would serve equally well here, but that would entail more complicated formulae
in the following analysis.

Fig. 1 - A class of symmetric trapezoidal probability densities for b, with the lower midpoint and upper
midpoint marked. The quantity T > 0 is the distribution width of the random variables y.
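The ingredients defined so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the sloping sides of the trapezoidal density are assumed to have width T, so that b can be drawn as a uniform variable on [0, s_0] plus an independent uniform variable on [-T/2, T/2]. The report does not state this construction explicitly.

```python
import random

def gain(x, b, a, k):
    """Gain G(x): a + k*x if x is at or below the boundary b, otherwise 0."""
    return a + k * x if x <= b else 0.0

def sample_boundary(s0, T, rng):
    """Draw b from a symmetric trapezoidal density with midpoints 0 and s0.

    Assumption: each sloping side has width T, so b is a uniform draw on
    [0, s0] plus an independent uniform draw on [-T/2, T/2].
    """
    return rng.uniform(0.0, s0) + rng.uniform(-T / 2.0, T / 2.0)

def sample_location(m, T, rng):
    """Actual search location y: rectangular density, median m, width T."""
    return rng.uniform(m - T / 2.0, m + T / 2.0)

rng = random.Random(0)
k, T, s0 = 1.0, 0.2, 8.0   # satisfies T < 1/(3k) and s0 > T
a = 3.0                    # satisfies a > 2 + 1/(3k)
b = sample_boundary(s0, T, rng)
y = sample_location(4.0, T, rng)
print(b, y, gain(y, b, a, k))
```

The parameter values are arbitrary choices that satisfy the constraints stated in the text.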

At decision time i, the searcher knows the values of s_0, T, k, a, i, and, for all j < i, the search
decisions m_j and the corresponding values of sgn(b - y_j). This last sequence of values represents a
knowledge of the side of the boundary b on which the previous actual search locations were. The
problem investigated in this report is that of finding search policies which maximize the (prior) expected
value of the return J. As usual, a policy, or strategy, is defined as a decision rule which determines the
searcher's action as a function of the information available to him, for any possible realization of the
search process, and for which the search terminates with a probability equal to 1. This search is
adaptive in the sense that the searcher's actions depend on previous search results.
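A single realization of the process can be simulated as follows. This is a minimal sketch, not the report's method: `run_search` and `make_bisection_policy` are hypothetical names, and the bisection rule with a fixed stopping width is only an illustrative policy, not an optimal one. The policy sees exactly the information described above, namely the pairs (m_j, sgn(b - y_j)).

```python
import random

def run_search(policy, b, T, a, k, rng, max_steps=1000):
    """Simulate one realization and return (J, history).

    policy(history) returns the next search point m, or None to terminate.
    history holds the pairs (m_j, sgn(b - y_j)), with sgn(0) = 1.
    """
    history = []
    last_y = None
    for _ in range(max_steps):
        m = policy(history)
        if m is None:
            break
        y = rng.uniform(m - T / 2.0, m + T / 2.0)   # actual location, unseen
        history.append((m, 1 if b - y >= 0 else -1))
        last_y = y
    n = len(history)
    if n == 0:
        return 0.0, history                         # J = 0 if N = 0
    g = a + k * last_y if last_y <= b else 0.0      # G(y_{N-1})
    return g - n, history                           # J = G(y_{N-1}) - N

def make_bisection_policy(s0, stop_width, T):
    """Illustrative rule: bisect until the bracket is narrow, then stop.

    If the last search overshot the boundary, search once more at lo - T,
    which always succeeds, so that termination follows a success.
    """
    def policy(history):
        lo, hi = 0.0, s0
        for m, sign in history:
            if sign > 0:
                lo = max(lo, m)
            else:
                hi = min(hi, m)
        if hi - lo > stop_width:
            return (lo + hi) / 2.0
        last_ok = bool(history) and history[-1][1] > 0
        return None if last_ok else lo - T
    return policy

rng = random.Random(1)
J, hist = run_search(make_bisection_policy(8.0, 1.0, 0.2),
                     b=5.3, T=0.2, a=3.0, k=1.0, rng=rng)
print(J, hist)
```

With b = 5.3 the rule searches at 4, 6, and 5 and then stops, so N = 3 and J equals the last actual location y_2 plus a minus 3.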

The  search  problem  treated  here  is  a  modification  of  one  studied  in  a  previous  work  by  the 
present  author  (1);  it  differs  in  only  two  details.  In  the  previous  work  a  =  0,  and  the  value  of  the 
return  J  is 
\[
\sup_{i<N} G(y_i)
\]

if N > 0. The present modifications make the search process a more accurate approximation to certain
kinds  of  searches  actually  conducted  in  mining  operations.  Because  of  its  similarity,  however,  much  of 
the  analysis  of  the  previous  problem  can  be  carried  over  to  this  one.  The  results  are  also  of  a  similar 
sort  and  will  be  compared  with  those  obtained  for  the  other  search  problem  at  the  end  of  this  report. 


ANOTHER  FORMULATION 


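It is convenient at this point to define the following four sequences of random variables:

\[
\begin{aligned}
h_i &= \min\bigl(\{s_0\} \cup \{m_j : y_j > b,\ j < i\}\bigr) \\
\ell_i &= \max\bigl(\{0\} \cup \{m_j : y_j \le b,\ j < i\}\bigr) \\
\lambda_i &= \max\bigl(\{0\} \cup \{y_j : y_j \le b,\ j < i\}\bigr) \\
z_i &= \tfrac{1}{2} + \tfrac{1}{2}\operatorname{sgn}(b - y_i); \quad i = 0, 1, \ldots; \quad z_{-1} = 0;
\end{aligned}
\]

where

\[
\operatorname{sgn}(0) = 1 .
\]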

It is immediately apparent that there is always a better alternative than choosing m_i outside the interval
[ℓ_i - T, h_i + T]. Search policies for which such a choice is possible will not be considered further. In
addition, we temporarily admit only policies for which m_i is always in the interval [ℓ_i + (1/3k),
h_i - (1/3k)] if h_i - ℓ_i ≥ 4/k.
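The four sequences can be computed recursively from a record of search points and actual locations. The sketch below uses a hypothetical helper name and takes the analyst's view, in which b and the y's are known; the searcher can compute h_i, ℓ_i, and z_i from the observed signs, but not λ_i.

```python
def summary_statistics(s0, b, ms, ys):
    """Return (h, l, lam, z) for search points ms and actual locations ys.

    h, l, lam have one more entry than ms (index i uses only epochs j < i);
    z has one entry per search, with z_i = 1 when y_i <= b.
    """
    h, l, lam, z = [s0], [0.0], [0.0], []
    for m, y in zip(ms, ys):
        below = y <= b                      # sgn(b - y) = 1, using sgn(0) = 1
        z.append(1 if below else 0)         # z_i = 1/2 + (1/2) sgn(b - y_i)
        h.append(h[-1] if below else min(h[-1], m))
        l.append(max(l[-1], m) if below else l[-1])
        lam.append(max(lam[-1], y) if below else lam[-1])
    return h, l, lam, z

h, l, lam, z = summary_statistics(8.0, 5.3, [4.0, 6.0, 5.0], [4.05, 5.9, 5.0])
print(h, l, lam, z)
```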

It can be shown by induction that it is possible to express the return as

\[
J = \sum_{i=0}^{N-1} \bigl[ a(z_i - z_{i-1}) + k\bigl(y_i z_i - \lambda_i (1 - z_i) z_{i-1}\bigr) - 1 \bigr]
\]

where N is the epoch at which termination occurs. This alternative expression for the return makes the
search process amenable to a dynamic programming analysis. The boundary location b, and the
quantities λ_i and z_{i-1} serve as the state variables at epoch i in this analysis; the intended search
points m_i are the control variables and the search “results” z_i are noisy measurements of the state.
The b component of the state is static; the z_{i-1} component is known exactly.


STATE  ESTIMATION 

The temporary policy restriction ensures that the points 0, s_0, and the m's are all separated by a
distance of at least T as long as h_i - ℓ_i ≥ 4/k. By using this fact and the statistical independence of
the random variables (y_i - m_i), the usual inductive use of the Bayes Rule shows that the posterior
probability density of b at epoch i (given the data available to the searcher at that time) is also a
symmetric trapezoidal density of the class previously shown in Fig. 1, whenever this condition is
satisfied. The upper and lower midpoints of this conditional density are h_i and ℓ_i, respectively. The
conditional density of λ_i, given b and the data at epoch i, is also determined by the posterior
distribution of b under these circumstances, namely by the parameter ℓ_i. Since the state variable z_{i-1}
is known exactly from the data at epoch i, it follows that h_i, ℓ_i, and z_{i-1} are sufficient statistics for
the joint conditional distribution of the state variables for the portion of the search where
h_i - ℓ_i ≥ 4/k. It is important to note that these estimation results depend on the fact that, for any
i > j ≥ 0, h_i - ℓ_i ≥ 4/k ⟹ h_j - ℓ_j ≥ 4/k, which is an immediate consequence of the definitions of
h_i and ℓ_i.
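The closure of the trapezoidal class under one Bayes step can be checked numerically. The following is a sketch under assumptions: the function names are hypothetical, densities are left unnormalized, and the sloping sides of the trapezoid are assumed to have width T. It multiplies the prior by the likelihood of the observed result on a grid and compares the product with the trapezoid whose midpoint has moved to m.

```python
def ramp(b, c, T):
    """Rise linearly from 0 to 1 across [c - T/2, c + T/2]."""
    return min(1.0, max(0.0, (b - c) / T + 0.5))

def trapezoid(b, lo, hi, T):
    """Unnormalized trapezoidal density of b with midpoints lo and hi."""
    return ramp(b, lo, T) * (1.0 - ramp(b, hi, T))

def likelihood(z, b, m, T):
    """P(z | b, m), where z = 1 means y <= b and y is rectangular around m."""
    p_below = ramp(b, m, T)
    return p_below if z == 1 else 1.0 - p_below

def max_posterior_gap(lo, hi, m, z, T, n=4001):
    """Largest gap between prior * likelihood and the predicted trapezoid."""
    new_lo, new_hi = (m, hi) if z == 1 else (lo, m)
    worst = 0.0
    for j in range(n):
        b = lo - T + (hi - lo + 2.0 * T) * j / (n - 1)
        post = trapezoid(b, lo, hi, T) * likelihood(z, b, m, T)
        worst = max(worst, abs(post - trapezoid(b, new_lo, new_hi, T)))
    return worst

print(max_posterior_gap(0.0, 8.0, 3.0, 1, 0.2))   # m well separated from 0 and 8
print(max_posterior_gap(0.0, 8.0, 0.05, 1, 0.2))  # m within T of the lower midpoint
```

In the first call the search point is at least T away from both midpoints and the gap vanishes; in the second the separation condition is violated and the product is no longer a trapezoid of the same class, which is why the temporary policy restriction is convenient.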

THE  VALUE  FUNCTION 

Let U be the class of search policies which satisfy the temporary restriction imposed previously
and for which the functional dependence of the action at epoch i on the available data is determined
uniquely by the statistics h_i, ℓ_i, and z_{i-1} for all i such that h_i - ℓ_i ≥ 4/k. Since the joint conditional
distribution of the state variables is also determined by these statistics in this case, and since the values
of (y_j - m_j) are statistically independent, the following definition is unambiguous for such a policy.

Definition: For h - (4/k) ≥ ℓ ≥ 0, z = 0 or 1, and π ∈ U, the quantity L(i, ℓ, h, z, π) is defined
as the conditional expected future return at epoch i from policy π given that ℓ_i = ℓ, h_i = h, and
z_{i-1} = z, where the future return at epoch i is the total return minus the return that would result from
terminating the search at that epoch.

For π ∈ U, the notation π(i, ℓ, h, z) is used to denote the action specified by π at epoch i for
ℓ_i = ℓ, h_i = h, and z_{i-1} = z. The value function is now defined as:

Definition:

\[
\begin{aligned}
Q(i,\ell,h) &= \sup_{\pi \in U} L(i,\ell,h,1,\pi) \\
R(i,\ell,h) &= \sup_{\pi \in U} L(i,\ell,h,0,\pi)
\end{aligned}
\qquad \text{for } h - (4/k) \ge \ell \ge 0;\ i = 0, 1, \ldots
\]


The  value  function  is  defined  in  terms  of  the  two  partial  functions  Q  and  R  for  conceptual  convenience. 
Intuitively,  Q  is  the  optimal  expected  future  return  if  the  last  search  point  was  below  the  boundary, 
and  R  is  the  optimal  expected  future  return  if  it  was  above  the  boundary. 


The results of Stratonovich (4) imply that the conditional expected future return for an optimal
policy at a given epoch of any realization is determined by the conditional probability distribution of
the state under those conditions. Therefore, this value function is the supremum of the conditional
expected future returns for all search policies satisfying the restriction imposed in an earlier section
titled “Another Formulation” (for the domain of definition of this function). In particular, it is the
optimal value function in the sense of Bellman (5) if optimal policies exist within this restricted set
of policies. Furthermore, it will be shown later that no optimal policies are excluded by this restriction,
so this value function has these properties with respect to the class of all admissible search policies.


The situation is more complicated if h_i - ℓ_i < 4/k because the statistics h_i, ℓ_i, and z_{i-1} are in
general no longer sufficient to determine the conditional probability distribution of the state variables.
This case will be treated separately.
THE BELLMAN EQUATION

For π ∈ U and h - (4/k) ≥ ℓ ≥ 0, the additive expression for J and the statistical independence of
the (y_i - m_i) imply the recursion

\[
L(i,\ell,h,z,\pi) =
\begin{cases}
I(i,\ell,h,z,m,\pi), & \text{if } \pi(i,\ell,h,z) = \text{``search at } m\text{''} \\
0, & \text{if } \pi(i,\ell,h,z) = \text{``terminate search''}
\end{cases}
\]

where

\[
I(i,\ell,h,z,m,\pi) = E\bigl\{ L(i+1,\ell_{i+1},h_{i+1},z_i,\pi) + a(z_i - z_{i-1})
 + k\bigl(y_i z_i - \lambda_i(1 - z_i) z_{i-1}\bigr) - 1 \bigm| \ell_i = \ell,\ h_i = h,\ z_{i-1} = z,\ m_i = m \bigr\} .
\]

Repeating the manipulations performed in Ref. 1, it follows from the Principle of Optimality developed
by Bellman (5) that the value function satisfies the equations (together constituting the Bellman
equation)


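\[
Q(i,\ell,h) = \max\Bigl\{ 0,\ \sup_{\ell+(1/3k) \le m \le h-(1/3k)} \Bigl\{
 \frac{h-m}{h-\ell}\bigl[k(m-\ell) + Q(i+1,m,h)\bigr]
 + \frac{m-\ell}{h-\ell}\bigl[R(i+1,\ell,m) - a - k\ell\bigr]
 - \frac{kT^2}{12(h-\ell)} - 1 \Bigr\} \Bigr\}
\]

\[
R(i,\ell,h) = \sup_{\ell+(1/3k) \le m \le h-(1/3k)} \Bigl\{
 \frac{h-m}{h-\ell}\bigl[a + km + Q(i+1,m,h)\bigr]
 + \frac{m-\ell}{h-\ell}\,R(i+1,\ell,m)
 - \frac{kT^2}{12(h-\ell)} - 1 \Bigr\}
\]

for h - (4/k) ≥ ℓ ≥ 0. The reason that R cannot be zero is that searching at m = 1/3k, then at
m = -1/3k (guaranteeing that z_{i+1} = 1), is always preferable to terminating the search at epoch i if
z_{i-1} = 0 because a > 2 + (1/3k). From these two equations, it follows that

\[
Q(i,\ell,h) = \max\{0,\ R(i,\ell,h) - a - k\ell\} \tag{1}
\]

in this range of ℓ and h.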
ADDITIONAL PROPERTIES OF THE OPTIMAL VALUE FUNCTION

The  next  step  here  is  to  establish  some  additional  properties  of  the  partial  value  function  Q  which 
will  be  useful  in  the  analysis  of  this  search  problem.  These  properties  are  proved  in  this  section  as  a 
series  of  lemmata.  In  the  context  of  this  entire  section,  Q  is  the  partial  value  function  as  defined  in  an 
earlier  section  titled  “The  Value  Function”,  and  not  any  other  solution  to  the  Bellman  equation,  if  such 
exist. 


LEMMA 1. h - ℓ = 4/k ⟹ Q(i, ℓ, h) = 0.

Proof. For any admissible policy and any realization of the random variables of the search process,
the future return at any epoch i, such that z_{i-1} = 1 and h_i - ℓ_i ≥ 4/k, is bounded above by the future
return from this policy in the search process analyzed in Ref. 1 with the same values of s_0, T, k, and i,
because the future return is the same if z_N = 1 and exceeds the future return of the present process by
a if z_N = 0. Since every policy which is admissible here is also admissible in this other search process,
and since the optimal expected future return is zero in the other process if h_i - ℓ_i = 4/k, the lemma is
verified.

LEMMA 2. Q(i, ℓ, h) is monotonic in (h - ℓ).

Proof. Suppose that h - ℓ ≥ 4/k is increased by a factor c > 1 for some value of i. If the
problem is changed so that the value of T is also increased by this factor, the remainder of the search
process at epoch i for the corresponding realization is merely scaled up by the same factor. Thus, for any
policy π ∈ U in the original problem giving conditional expected future return M at this point, the
scaled-up policy (such that c m_j always replaces m_j) is admissible in the scaled-up problem and gives a
return greater than M for each realization, and hence a greater conditional expected future return at the
corresponding point. Therefore,

\[
Q(i,\ell,h) \le Q_1(i,c\ell,ch)
\]

where Q_1 is the corresponding partial value function for the scaled-up search problem. Finally, since an
increase of T is a degradation of search data, it can never increase the supremum of a conditional
expected future return, so that

\[
Q_1(i,c\ell,ch) \le Q(i,c\ell,ch).
\]

Definition: s* = inf{s : there exist i, h, ℓ such that s = h - ℓ and Q(i, ℓ, h) > 0}.

LEMMA 3. Q(i, ℓ, ℓ + s*) = 0.

Proof. If s* = 4/k, this is true by Lemma 1; if not, s* > 4/k. In this case, assume that
Q(i, ℓ, ℓ + s*) = g > 0. Let m* ∈ [ℓ + (1/3k), ℓ + s* - (1/3k)] be such that

\[
\frac{k}{s^*}(\ell + s^* - m^*)(m^* - \ell)
 + \bigl[R(i+1,\ell,m^*) - a - k\ell\bigr]\frac{m^* - \ell}{s^*}
 - \frac{kT^2}{12 s^*} - 1 > 0 .
\]

Such an m* exists by virtue of the Bellman equation and Lemma 1. It also follows from Lemma 1 that
for any ε such that 0 < ε < s* - 4/k,

\[
Q(i,\ell,\ell+s^*-\varepsilon) = 0 \ge \Bigl(
 \frac{k}{s^*-\varepsilon}(\ell + s^* - \varepsilon - m^*)(m^* - \ell)
 + \bigl[R(i+1,\ell,m^*) - a - k\ell\bigr]\frac{m^* - \ell}{s^*-\varepsilon}
 - 1 - \frac{kT^2}{12(s^*-\varepsilon)} \Bigr).
\]

From the definition of m*, however, this is impossible for sufficiently small ε because all of the terms
in the large parenthesis are continuous functions of ε.

LEMMA 4. Q(i, ℓ, ℓ + s) depends only on s.

Proof. Lemmata 1 to 3 imply that Q satisfies the following modified Bellman equation:

\[
Q(i,\ell,h) = \sup_{1/3k \le u \le s-(1/3k)} \Bigl\{
 I(s)\Bigl[\frac{s-u}{s}\Bigl(ku - \frac{kT^2}{12(s-u)}\Bigr) - 1\Bigr]
 + I(s)\,\frac{s-u}{s}\,Q(i+1,\ell+u,\ell+s)
 + I(s)\,\frac{u}{s}\,Q(i+1,\ell,\ell+u) \Bigr\}
\]

where

\[
I(s) = \begin{cases}
0, & \text{for } s \le s^* \\
1, & \text{for } s > s^*
\end{cases}
\]

and where s = h - ℓ and u = m - ℓ. Since Q(i, ℓ, ℓ) = 0 by Lemma 1, Q(i, ℓ, ℓ + s) must depend only
on s to avoid a contradiction.


A  SIMPLIFICATION 

By Lemma 4, the following definition is unambiguous:

Definition:

\[
V(s) = \begin{cases}
Q(i,\ell,\ell+s), & \text{for } s \ge 4/k \\
0, & \text{for } 0 \le s < 4/k .
\end{cases}
\]

Also, by the proof of Lemma 4, V satisfies the equation and boundary condition given by

\[
V(s) = \sup_{1/3k \le u \le s-(1/3k)} \Bigl\{
 I(s)\Bigl[\frac{s-u}{s}\Bigl(ku - \frac{kT^2}{12(s-u)}\Bigr) - 1\Bigr]
 + I(s)\,\frac{s-u}{s}\,V(s-u)
 + I(s)\,\frac{u}{s}\,V(u) \Bigr\} \tag{2}
\]

and

\[
V(0) = 0
\]

where

\[
I(s) = \begin{cases}
0, & \text{for } s \le s^* \\
1, & \text{for } s > s^* .
\end{cases}
\]

LEMMA 5. There exists a unique solution V to Eq. 2.

Proof. By an extension of Theorem 1 in Chapter IV of Bellman's book (5), there exists one and
only one solution V to this equation such that V(0) = 0 and V(s) is continuous at s = 0. Since all
solutions clearly have these two properties, the lemma follows.

The  problem  of  finding  the  partial  value  function  Q  has  now  been  simplified  to  finding  s*  and 
finding  a  solution  to  Eq.  2.  The  following  result  is  helpful  in  determining  s*: 

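LEMMA 6. s* = sup{s > 0 : R(0, 0, s) ≤ a}.

Proof. By Lemma 4 and Eq. 1,

\[
Q(i,\ell,h) = Q(0,0,s) = \max\{0,\ R(0,0,h-\ell) - a\} .
\]

Therefore,

\[
Q(i,\ell,h) > 0 \iff R(0,0,s) - a > 0 \text{ and } s = h - \ell .
\]

By definition,

\[
s^* = \inf\{s > 0 : \exists\, i,\ell,h \text{ such that } s = h - \ell \text{ and } Q(i,\ell,h) > 0\} .
\]

Hence,

\[
\begin{aligned}
s^* &= \inf\{s > 0 : \exists\, i,\ell,h \text{ such that } s = h - \ell \text{ and } R(0,0,s) > a\} \\
 &= \inf\{s > 0 : R(0,0,s) > a\} \\
 &= \sup\{s > 0 : R(0,0,s) \le a\},
\end{aligned}
\]

since termination is nonoptimal, Q(i, ℓ, h) is monotonic in (h - ℓ), and R = Q when Q > 0.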

OPTIMAL  POLICIES 

The simplified Bellman Eq. 2 is exactly the same as the one derived for the search process studied
in Ref. 1, except that the parameter s* here replaces s_min there. The results there, and the fact that
Q = R for Q > 0, imply the following results for π ∈ U (this policy restriction is shown to be superfluous
in the next section), where s_i denotes h_i - ℓ_i:

a. If z_{i-1} = 1 at epoch i, it is optimal to continue the search if and only if s_i > s*.

b. It is always optimal to continue the search at epoch i if z_{i-1} = 0.

c. If s ∈ (2^{n-1} s*, 2^n s*], n = 0, 1, 2, ..., then V^-(s) ≤ V(s) ≤ V^+(s). [The closed-form
expressions for the bounds V^-(s) and V^+(s), and for the parameter ŝ on which they depend, are
illegible in the source.]

d. V^+(s) - V^-(s) ≤ V^+(s*) for s > s*/2.

e. V^+(2^n ŝ) = V^-(2^n ŝ) = V(2^n ŝ) for n = 0, 1, 2, ....

f. The policy π^-, where m_i = (h_i/2) + (ℓ_i/2), is optimal if s_0 = 2^n ŝ, n = 1, 2, ..., and h_i - ℓ_i > s*.

g. Cases exist where h_i - ℓ_i > s* and m_i = (h_i/2) + (ℓ_i/2) is not optimal.


Furthermore, if T = 0 the conditional probability density of b given the search results is rectangular for
any policy in U, so the Bellman equation can be extended to the entire search process. It is then
straightforward, but tedious, to show by direct substitution in the Bellman equation that the policy

\[
u_i = \begin{cases}
0, & \text{if } 0 < s_i \le 1/k \\[2pt]
\dfrac{s_i}{2} - \dfrac{1}{2k}, & \text{if } 1/k \le s_i \le 3/k \\[6pt]
\dfrac{2 s_i}{3} - \dfrac{1}{k}, & \text{if } 3/k \le s_i \le (3+\sqrt{6})/k
\end{cases}
\]

(with u_i = m_i - ℓ_i and s_i = h_i - ℓ_i) is optimal for s_i ≤ s* and, by Lemma 6, that
s* = (3 + √6)/k in this case. If T > 0 it is possible that admissible policies lead to posterior
probability distributions for b which are not symmetric trapezoidal, so s* and optimal policies for
s < s* cannot be found in this way. Since an increase of T from zero to a positive value represents a
degradation of information, however, the quantity (3 + √6)/k is a lower bound for s* in this case. An
upper bound can be established by evaluating the expected return from the admissible but nonoptimal
policy


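[The piecewise definition of this policy is only partly legible in the source. It sets u_i = -T if
-T ≤ h_i - ℓ_i ≤ 1/k, and it has separate branches for 1/k ≤ h_i - ℓ_i ≤ 3/k and for
3/k ≤ h_i - ℓ_i ≤ 6/k.]

It is not important to evaluate this return exactly; it is positive if s_i takes a particular value
[expression illegible in the source].

Hence, the results about s* obtained here can be summarized as

(3 + √6)/k ≤ s* ≤ (3 + √6)/k + [a term in kT that is illegible in the source].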
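The T = 0 policy for s_i ≤ s* described above can be checked numerically. The sketch below rests on assumptions that are not stated in this form in the report: with T = 0 and Q = 0 for widths up to s*, the quantity w(s) = R(i, ℓ, ℓ + s) - a - kℓ is taken to satisfy w(s) = sup over u of [(s - u)/s] k u + (u/s) w(u) - 1, and the closed forms in `w` were obtained by substituting the three-branch policy into that recursion. The function names are hypothetical.

```python
import math

def policy_offset(s, k):
    """Offset u = m - l of the T = 0 policy for 0 < s <= s*."""
    if s <= 1.0 / k:
        return 0.0
    if s <= 3.0 / k:
        return s / 2.0 - 1.0 / (2.0 * k)
    return 2.0 * s / 3.0 - 1.0 / k

def w(s, k):
    """Assumed closed form of R - a - k*l under the policy, for 0 <= s <= s*."""
    x = k * s
    if x <= 1.0:
        return -1.0
    if x <= 3.0:
        return (x - 1.0) ** 2 / (4.0 * x) - 1.0
    return x / 3.0 - 2.0 + 1.0 / x

def one_step(s, u, k):
    """Search at offset u, then follow the policy (valid for s <= s*)."""
    return (s - u) / s * k * u + (u / s) * w(u, k) - 1.0

k = 1.0
s_star = (3.0 + math.sqrt(6.0)) / k
print(s_star, w(s_star, k))   # w vanishes at s*, as Lemma 6 requires
for s in (0.5, 2.0, 4.0, s_star):
    best = max(one_step(s, s * j / 2000.0, k) for j in range(2001))
    print(s, policy_offset(s, k), w(s, k), best)
```

For each width tested, the grid maximum of the one-step expression agrees with the value obtained at the policy's offset, and w changes sign at (3 + √6)/k, which is consistent with the stated value of s* for T = 0.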


REMOVAL  OF  POLICY  RESTRICTION 

The analysis in this report has heretofore been based on a restriction of admissible search policies
to those for which m_i ∈ [ℓ_i + 1/3k, h_i - 1/3k] always. It is the purpose of the present section to show
that this restriction, although convenient for analytical reasons, is superfluous with regard to optimal
policies. In particular, it is shown that for any policy not satisfying this restriction there is a policy that
does, and one which gives an expected return at least as great. Consequently, the results pertaining to
optimal search policies in this report can be regarded as applying to all search policies without restriction.

LEMMA 7. If s_i ≥ 4/k and T < 1/3k for any epoch i, it is not optimal to choose m_i such that
m_i ∉ [ℓ_i + (1/3k), h_i - (1/3k)].

Proof. Assuming the contrary, the Principle of Optimality (5) implies the existence of a case
where s_0 ≥ 4/k, T < 1/3k, a policy, and a realization of the search process such that m_i < ℓ_i + 1/3k, or
m_i > h_i - 1/3k, for some epoch i, and such that this policy's conditional expected future return at that
point is greater than that of any policy which terminates then or for which ℓ_i + 1/3k ≤ m_i ≤ h_i - 1/3k.

Let the variable triple (s_0, T, k) be fixed such that this possibility exists and let A be the set of all such
triples with this property. Define the set B as

\[
B = \{x \ge 4/k : (x, T, k) \in A\} .
\]

Let σ be an element of B such that σ < inf B + 1/3k, and let Π denote an admissible policy for which the
possibility described above exists for the triple (σ, T, k), the existence of which is guaranteed by the
construction of σ. Consider the corresponding search process and realization and denote by i the first
epoch for which m_i < ℓ_i + 1/3k or m_i > h_i - 1/3k. For convenience, denote h_i - ℓ_i by s_i and m_i - ℓ_i
by u_i. All probabilities and expectations in the following computations are meant to be conditioned on
the data available to the searcher at epoch i.

Since ℓ_j + T < ℓ_j + 1/3k ≤ m_j ≤ h_j - 1/3k < h_j - T for all j < i, the conditional density of b at
epoch i is symmetric trapezoidal with upper and lower midpoints at h_i and ℓ_i. By construction,

\[
s_i \ge 4/k > 12T .
\]

Case 1: m_i < ℓ_i + 1/3k. In this case, the searcher's conditional expected future return is not
decreased by giving him free knowledge at epoch i+1 of the random variable (1/2) - (1/2) sgn(b - α + ξ),
where α = max(1/3k, u_i + 1/3k) and ξ is a random variable independent of b and the y's with rectangular
density of median zero and width T, and allowing him to proceed optimally with this extra knowledge. At
this point (epoch i+1) the conditional density of b is symmetric trapezoidal by construction with
midpoints less than inf(B) apart. Hence, the optimal expected future return is given by the Bellman
equation for policies in U. Some tedious computations similar to those in Ref. 1 then show that the
conditional expected future return from policy Π at epoch i in this case is less than or equal to

\[
a - 1 + k(\alpha + T) + Q(i, \ell_i + \alpha, h_i)
\]

if z_{i-1} = 0, and

\[
-1 + k(\alpha + T) + Q(i, \ell_i + \alpha, h_i)
\]

if z_{i-1} = 1. Since Q(i, ℓ, h) is monotonic in h - ℓ and is greater than or equal to R(i, ℓ, h), either of
these possibilities implies that

\[
0 < -1 + k(\alpha + T - u_i) < kT - 1/3,
\]

contradicting the original assumption.

Case 2: m_i > h_i - 1/3k. The proof in this case is similar, except that the “free” extra information
given to the searcher at epoch i+1 is the random variable (1/2) - (1/2) sgn(b - β + ξ), where
β = min(h_i - 1/3k, m_i - 1/3k).


DISCUSSION 

As was mentioned in the section titled “Another Formulation,” the search problem investigated in
this report is similar to the one described in Ref. 1. The main differences in the results here are the
inclusion of a new state variable z_{i-1} in the formulation of the search as an optimal control problem, a
new stopping rule, the searching strategy for s_i < s*, and the difficulty in finding the value of s* (which
corresponds to s_min in the previous problem) for T ≠ 0. If s_0 > s*, the optimal expected return in this
search process is a + V(s_0), where V is defined by Eq. 2. Although this function is slightly different
from the function called V in Ref. 1, it has many similar properties (see section titled “Optimal Policies”)
and obeys the same equation, except that s* ≠ s_min. It is for this reason that the certainty-equivalent
policy π^- is optimal for s_i = 2^n ŝ in this search process, as well as in the one investigated in Ref. 1.